# Import the required libraries and dependencies
import pandas as pd
import hvplot.pandas
from pathlib import Path
import geoviews
# Using the read_csv function and Path module, create a DataFrame
csvpath = Path("./Resources/sfo_neighborhoods_census_data.csv")
sfo_data_df = pd.read_csv(csvpath)
# Review the first and last five rows of the DataFrame
sfo_data_df
| | year | neighborhood | sale_price_sqr_foot | housing_units | gross_rent |
|---|---|---|---|---|---|
| 0 | 2010 | Alamo Square | 291.182945 | 372560 | 1239 |
| 1 | 2010 | Anza Vista | 267.932583 | 372560 | 1239 |
| 2 | 2010 | Bayview | 170.098665 | 372560 | 1239 |
| 3 | 2010 | Buena Vista Park | 347.394919 | 372560 | 1239 |
| 4 | 2010 | Central Richmond | 319.027623 | 372560 | 1239 |
| ... | ... | ... | ... | ... | ... |
| 392 | 2016 | Telegraph Hill | 903.049771 | 384242 | 4390 |
| 393 | 2016 | Twin Peaks | 970.085470 | 384242 | 4390 |
| 394 | 2016 | Van Ness/ Civic Center | 552.602567 | 384242 | 4390 |
| 395 | 2016 | Visitacion Valley | 328.319007 | 384242 | 4390 |
| 396 | 2016 | Westwood Park | 631.195426 | 384242 | 4390 |
397 rows × 5 columns
Use the groupby function to group the data by year. Aggregate the results by the mean of the groups.

# Create a numerical aggregation that groups the data by year and then averages the results
# numeric_only=True excludes the string "neighborhood" column from the mean
sfo_data_grouped = sfo_data_df.groupby('year').mean(numeric_only=True)
# Review the DataFrame
sfo_data_grouped
| year | sale_price_sqr_foot | housing_units | gross_rent |
|---|---|---|---|
| 2010 | 369.344353 | 372560.0 | 1239.0 |
| 2011 | 341.903429 | 374507.0 | 1530.0 |
| 2012 | 399.389968 | 376454.0 | 2324.0 |
| 2013 | 483.600304 | 378401.0 | 2971.0 |
| 2014 | 556.277273 | 380348.0 | 3528.0 |
| 2015 | 632.540352 | 382295.0 | 3739.0 |
| 2016 | 697.643709 | 384242.0 | 4390.0 |
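A side note on the aggregation: when a group still contains a non-numeric column (here, `neighborhood`), recent pandas versions require it to be excluded before averaging. A minimal sketch with toy data:

```python
import pandas as pd

# Toy frame with a string column alongside the numeric ones
df = pd.DataFrame(
    {
        "year": [2010, 2010, 2011],
        "neighborhood": ["Alamo Square", "Bayview", "Alamo Square"],
        "gross_rent": [1239, 1239, 1530],
    }
)

# numeric_only=True drops the string column instead of raising a TypeError
grouped = df.groupby("year").mean(numeric_only=True)
print(grouped.columns.tolist())         # ['gross_rent']
print(grouped.loc[2010, "gross_rent"])  # 1239.0
```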
# Create a visual aggregation to explore the housing units by year
# Note: hvplot does not take Matplotlib's figsize; frame_width/frame_height set the plot size
sfo_data_grouped.hvplot.bar(
    x="year",
    y="housing_units",
    ylim=(370_000, 390_000),
    frame_width=700,
    frame_height=400,
)
Question: What is the overall trend in housing_units over the period being analyzed?
Answer: # Housing units increased steadily over the period, by about 1,947 units each year from 2010 through 2016.
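The steadiness of that trend can be checked numerically. A minimal sketch using the yearly means from the grouped table above:

```python
import pandas as pd

# Yearly mean housing units, copied from the grouped table above
housing_units = pd.Series(
    [372560, 374507, 376454, 378401, 380348, 382295, 384242],
    index=pd.Index(range(2010, 2017), name="year"),
)

# Year-over-year change; an all-positive diff confirms a steady increase
yoy = housing_units.diff().dropna()
print((yoy > 0).all())  # True
print(yoy.unique())     # [1947.] -- a constant 1,947 units per year
```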
Question: What is the lowest gross rent reported for the years included in the DataFrame?
Answer: # The lowest gross rent is $1,239, reported in 2010.
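That figure can be pulled directly with min and idxmin. A sketch using the gross_rent values from the grouped table above:

```python
import pandas as pd

# Yearly mean gross rent, copied from the grouped table above
gross_rent = pd.Series(
    [1239, 1530, 2324, 2971, 3528, 3739, 4390],
    index=pd.Index(range(2010, 2017), name="year"),
)

print(gross_rent.min())     # 1239 -- the lowest gross rent
print(gross_rent.idxmin())  # 2010 -- the year it occurred
```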
Create a new DataFrame named prices_square_foot_by_year by filtering out the "housing_units" column. The new DataFrame should include the averages per year for only the sale price per square foot and the gross rent.

# Filter out the housing_units column, creating a new DataFrame
# Keep only the sale_price_sqr_foot and gross_rent averages per year
prices_square_foot_by_year = sfo_data_grouped.drop(columns='housing_units')
# Review the DataFrame
prices_square_foot_by_year
| year | sale_price_sqr_foot | gross_rent |
|---|---|---|
| 2010 | 369.344353 | 1239.0 |
| 2011 | 341.903429 | 1530.0 |
| 2012 | 399.389968 | 2324.0 |
| 2013 | 483.600304 | 2971.0 |
| 2014 | 556.277273 | 3528.0 |
| 2015 | 632.540352 | 3739.0 |
| 2016 | 697.643709 | 4390.0 |
# Plot prices_square_foot_by_year.
prices_square_foot_by_year.hvplot(ylabel="Gross Rent / Sale Price per Square Foot")
Use the prices_square_foot_by_year DataFrame and interactive plots to answer the following questions:

Question: Did any year experience a drop in the average sale price per square foot compared to the previous year?
Answer: # Yes. In 2011 the average sale price per square foot fell to $341.90 from $369.34 in 2010.
Question: If so, did the gross rent increase or decrease during that year?
Answer: # Gross rent increased that year, rising from $1,239 in 2010 to $1,530 in 2011.
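Both answers can be verified programmatically with diff. A sketch using the values from the prices_square_foot_by_year table above:

```python
import pandas as pd

# Values copied from the prices_square_foot_by_year table above
prices = pd.DataFrame(
    {
        "sale_price_sqr_foot": [369.344353, 341.903429, 399.389968,
                                483.600304, 556.277273, 632.540352, 697.643709],
        "gross_rent": [1239.0, 1530.0, 2324.0, 2971.0, 3528.0, 3739.0, 4390.0],
    },
    index=pd.Index(range(2010, 2017), name="year"),
)

# Years where the average sale price fell relative to the prior year
drops = prices["sale_price_sqr_foot"].diff() < 0
print(prices.index[drops].tolist())  # [2011]

# Gross rent still rose in that year
print(prices.loc[2011, "gross_rent"] > prices.loc[2010, "gross_rent"])  # True
```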
Group the data by year and neighborhood, then aggregate the results by the mean of the groups.

# Group by year and neighborhood, then create a new DataFrame of the mean values
prices_by_year_by_neighborhood = sfo_data_df.groupby(['year', 'neighborhood'], as_index=False).mean()
# Review the DataFrame
prices_by_year_by_neighborhood
| | year | neighborhood | sale_price_sqr_foot | housing_units | gross_rent |
|---|---|---|---|---|---|
| 0 | 2010 | Alamo Square | 291.182945 | 372560.0 | 1239.0 |
| 1 | 2010 | Anza Vista | 267.932583 | 372560.0 | 1239.0 |
| 2 | 2010 | Bayview | 170.098665 | 372560.0 | 1239.0 |
| 3 | 2010 | Buena Vista Park | 347.394919 | 372560.0 | 1239.0 |
| 4 | 2010 | Central Richmond | 319.027623 | 372560.0 | 1239.0 |
| ... | ... | ... | ... | ... | ... |
| 392 | 2016 | Telegraph Hill | 903.049771 | 384242.0 | 4390.0 |
| 393 | 2016 | Twin Peaks | 970.085470 | 384242.0 | 4390.0 |
| 394 | 2016 | Van Ness/ Civic Center | 552.602567 | 384242.0 | 4390.0 |
| 395 | 2016 | Visitacion Valley | 328.319007 | 384242.0 | 4390.0 |
| 396 | 2016 | Westwood Park | 631.195426 | 384242.0 | 4390.0 |
397 rows × 5 columns
Filter out the "housing_units" column so that the new DataFrame includes only the sale_price_sqr_foot and gross_rent averages per year.

# Filter out the housing_units column
prices_by_year_by_neighborhood = prices_by_year_by_neighborhood.drop(columns='housing_units')
# Review the first and last five rows of the DataFrame
prices_by_year_by_neighborhood
| | year | neighborhood | sale_price_sqr_foot | gross_rent |
|---|---|---|---|---|
| 0 | 2010 | Alamo Square | 291.182945 | 1239.0 |
| 1 | 2010 | Anza Vista | 267.932583 | 1239.0 |
| 2 | 2010 | Bayview | 170.098665 | 1239.0 |
| 3 | 2010 | Buena Vista Park | 347.394919 | 1239.0 |
| 4 | 2010 | Central Richmond | 319.027623 | 1239.0 |
| ... | ... | ... | ... | ... |
| 392 | 2016 | Telegraph Hill | 903.049771 | 4390.0 |
| 393 | 2016 | Twin Peaks | 970.085470 | 4390.0 |
| 394 | 2016 | Van Ness/ Civic Center | 552.602567 | 4390.0 |
| 395 | 2016 | Visitacion Valley | 328.319007 | 4390.0 |
| 396 | 2016 | Westwood Park | 631.195426 | 4390.0 |
397 rows × 4 columns
Create an interactive line plot that visualizes both sale_price_sqr_foot and gross_rent. Set the x-axis parameter to the year (x="year"). Use the groupby parameter to create an interactive widget for neighborhood.

# Use hvplot to create an interactive line plot of the average price per square foot
# The plot should have a dropdown selector for the neighborhood
prices_by_year_by_neighborhood.hvplot(x='year', ylabel='Price (USD)', groupby='neighborhood')
Question: For the Anza Vista neighborhood, is the average sale price per square foot for 2016 more or less than the price that’s listed for 2012?
Answer: # Less. For Anza Vista, the 2016 average sale price per square foot is below the 2012 figure.
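The comparison behind this answer is a simple two-key lookup. A sketch with illustrative figures only (the real values come from prices_by_year_by_neighborhood):

```python
import pandas as pd

# Illustrative figures only -- not the actual Anza Vista values
df = pd.DataFrame(
    {
        "year": [2012, 2016],
        "neighborhood": ["Anza Vista", "Anza Vista"],
        "sale_price_sqr_foot": [350.0, 300.0],
    }
)

def price_for(frame, neighborhood, year):
    """Average sale price per square foot for one neighborhood-year pair."""
    mask = (frame["neighborhood"] == neighborhood) & (frame["year"] == year)
    return frame.loc[mask, "sale_price_sqr_foot"].iloc[0]

# 2016 lower than 2012 for this neighborhood
print(price_for(df, "Anza Vista", 2016) < price_for(df, "Anza Vista", 2012))  # True
```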
Load the neighborhoods_coordinates.csv file from the Resources folder into the notebook, and create a DataFrame named neighborhood_locations_df. Be sure to set the index_col of the DataFrame as "Neighborhood".

# Load neighborhoods coordinates data
csvpath = Path("./Resources/neighborhoods_coordinates.csv")
neighborhood_locations_df = pd.read_csv(csvpath, index_col="Neighborhood")
# Review the DataFrame
neighborhood_locations_df
| Neighborhood | Lat | Lon |
|---|---|---|
| Alamo Square | 37.791012 | -122.402100 |
| Anza Vista | 37.779598 | -122.443451 |
| Bayview | 37.734670 | -122.401060 |
| Bayview Heights | 37.728740 | -122.410980 |
| Bernal Heights | 37.728630 | -122.443050 |
| ... | ... | ... |
| West Portal | 37.740260 | -122.463880 |
| Western Addition | 37.792980 | -122.435790 |
| Westwood Highlands | 37.734700 | -122.456854 |
| Westwood Park | 37.734150 | -122.457000 |
| Yerba Buena | 37.792980 | -122.396360 |
73 rows × 2 columns
Using the sfo_data_df DataFrame, create a DataFrame named all_neighborhood_info_df that groups the data by neighborhood. Aggregate the results by the mean of the group.

# Calculate the mean values for each neighborhood
all_neighborhood_info_df = sfo_data_df.groupby('neighborhood').mean()
# Review the resulting DataFrame
all_neighborhood_info_df
| neighborhood | year | sale_price_sqr_foot | housing_units | gross_rent |
|---|---|---|---|---|
| Alamo Square | 2013.000000 | 366.020712 | 378401.00 | 2817.285714 |
| Anza Vista | 2013.333333 | 373.382198 | 379050.00 | 3031.833333 |
| Bayview | 2012.000000 | 204.588623 | 376454.00 | 2318.400000 |
| Bayview Heights | 2015.000000 | 590.792839 | 382295.00 | 3739.000000 |
| Bernal Heights | 2013.500000 | 576.746488 | 379374.50 | 3080.333333 |
| ... | ... | ... | ... | ... |
| West Portal | 2012.250000 | 498.488485 | 376940.75 | 2515.500000 |
| Western Addition | 2012.500000 | 307.562201 | 377427.50 | 2555.166667 |
| Westwood Highlands | 2012.000000 | 533.703935 | 376454.00 | 2250.500000 |
| Westwood Park | 2015.000000 | 687.087575 | 382295.00 | 3959.000000 |
| Yerba Buena | 2012.500000 | 576.709848 | 377427.50 | 2555.166667 |
73 rows × 4 columns
Concatenate the neighborhood_locations_df DataFrame with the all_neighborhood_info_df DataFrame.

Note that the first cell uses the Pandas concat function to create a DataFrame named all_neighborhoods_df. The second cell cleans the data and sets the "Neighborhood" column. Be sure to run these cells to create the all_neighborhoods_df DataFrame, which you'll need to create the geospatial visualization.
# Using the Pandas `concat` function, join the
# neighborhood_locations_df and the all_neighborhood_info_df DataFrame
# The axis of the concatenation is "columns".
# The concat function aligns the rows of the two DataFrames on their
# shared neighborhood index, keeping the columns from both.
all_neighborhoods_df = pd.concat(
[neighborhood_locations_df, all_neighborhood_info_df],
axis="columns",
sort=False
)
all_neighborhoods_df = all_neighborhoods_df.dropna()
# Review the resulting DataFrame
display(all_neighborhoods_df.head())
display(all_neighborhoods_df.tail())
| | Lat | Lon | year | sale_price_sqr_foot | housing_units | gross_rent |
|---|---|---|---|---|---|---|
| Alamo Square | 37.791012 | -122.402100 | 2013.000000 | 366.020712 | 378401.0 | 2817.285714 |
| Anza Vista | 37.779598 | -122.443451 | 2013.333333 | 373.382198 | 379050.0 | 3031.833333 |
| Bayview | 37.734670 | -122.401060 | 2012.000000 | 204.588623 | 376454.0 | 2318.400000 |
| Bayview Heights | 37.728740 | -122.410980 | 2015.000000 | 590.792839 | 382295.0 | 3739.000000 |
| Buena Vista Park | 37.768160 | -122.439330 | 2012.833333 | 452.680591 | 378076.5 | 2698.833333 |
| | Lat | Lon | year | sale_price_sqr_foot | housing_units | gross_rent |
|---|---|---|---|---|---|---|
| West Portal | 37.74026 | -122.463880 | 2012.25 | 498.488485 | 376940.75 | 2515.500000 |
| Western Addition | 37.79298 | -122.435790 | 2012.50 | 307.562201 | 377427.50 | 2555.166667 |
| Westwood Highlands | 37.73470 | -122.456854 | 2012.00 | 533.703935 | 376454.00 | 2250.500000 |
| Westwood Park | 37.73415 | -122.457000 | 2015.00 | 687.087575 | 382295.00 | 3959.000000 |
| Yerba Buena | 37.79298 | -122.396360 | 2012.50 | 576.709848 | 377427.50 | 2555.166667 |
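The key behavior here is that concat with axis="columns" matches rows by index label rather than by position. A minimal sketch with two small frames:

```python
import pandas as pd

# Two small frames that share a neighborhood index
locations = pd.DataFrame(
    {"Lat": [37.791012, 37.779598], "Lon": [-122.402100, -122.443451]},
    index=["Alamo Square", "Anza Vista"],
)
info = pd.DataFrame(
    {"gross_rent": [2817.285714, 3031.833333]},
    index=["Alamo Square", "Anza Vista"],
)

# Rows are aligned by index label; columns from both frames are kept
combined = pd.concat([locations, info], axis="columns", sort=False)
print(combined.columns.tolist())  # ['Lat', 'Lon', 'gross_rent']
print(combined.shape)             # (2, 3)
```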
# Call the dropna function to remove any neighborhoods that do not have data
all_neighborhoods_df = all_neighborhoods_df.reset_index().dropna()
# Rename the "index" column as "Neighborhood" for use in the Visualization
all_neighborhoods_df = all_neighborhoods_df.rename(columns={"index": "Neighborhood"})
# Review the resulting DataFrame
display(all_neighborhoods_df)
| | Neighborhood | Lat | Lon | year | sale_price_sqr_foot | housing_units | gross_rent |
|---|---|---|---|---|---|---|---|
| 0 | Alamo Square | 37.791012 | -122.402100 | 2013.000000 | 366.020712 | 378401.00 | 2817.285714 |
| 1 | Anza Vista | 37.779598 | -122.443451 | 2013.333333 | 373.382198 | 379050.00 | 3031.833333 |
| 2 | Bayview | 37.734670 | -122.401060 | 2012.000000 | 204.588623 | 376454.00 | 2318.400000 |
| 3 | Bayview Heights | 37.728740 | -122.410980 | 2015.000000 | 590.792839 | 382295.00 | 3739.000000 |
| 4 | Buena Vista Park | 37.768160 | -122.439330 | 2012.833333 | 452.680591 | 378076.50 | 2698.833333 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 64 | West Portal | 37.740260 | -122.463880 | 2012.250000 | 498.488485 | 376940.75 | 2515.500000 |
| 65 | Western Addition | 37.792980 | -122.435790 | 2012.500000 | 307.562201 | 377427.50 | 2555.166667 |
| 66 | Westwood Highlands | 37.734700 | -122.456854 | 2012.000000 | 533.703935 | 376454.00 | 2250.500000 |
| 67 | Westwood Park | 37.734150 | -122.457000 | 2015.000000 | 687.087575 | 382295.00 | 3959.000000 |
| 68 | Yerba Buena | 37.792980 | -122.396360 | 2012.500000 | 576.709848 | 377427.50 | 2555.166667 |
69 rows × 7 columns
Create a points plot for the all_neighborhoods_df DataFrame. Be sure to do the following:

- Set the geo parameter to True.
- Set the size parameter to "sale_price_sqr_foot".
- Set the color parameter to "gross_rent".
- Set the frame_width parameter to 700.
- Set the frame_height parameter to 500.

# Create a plot to analyze neighborhood info
all_neighborhoods_df.hvplot.points(
'Lon',
'Lat',
geo=True,
size='sale_price_sqr_foot',
scale=1,
color='gross_rent',
    title='San Francisco Gross Rent and Sale Price per Square Foot',
tiles=True,
frame_width=700,
frame_height=500
)
# Sort neighborhoods by average sale price per square foot
all_neighborhoods_df.sort_values(by='sale_price_sqr_foot')
| | Neighborhood | Lat | Lon | year | sale_price_sqr_foot | housing_units | gross_rent |
|---|---|---|---|---|---|---|---|
| 55 | Silver Terrace | 37.73467 | -122.40106 | 2014.00 | 170.292549 | 380348.00 | 3528.000000 |
| 22 | Hunters Point | 37.72551 | -122.37178 | 2012.50 | 170.624920 | 377427.50 | 2489.000000 |
| 2 | Bayview | 37.73467 | -122.40106 | 2012.00 | 204.588623 | 376454.00 | 2318.400000 |
| 44 | Outer Mission | 37.72280 | -122.43869 | 2013.50 | 242.370952 | 379374.50 | 2995.750000 |
| 63 | Visitacion Valley | 37.72874 | -122.41098 | 2014.50 | 301.466180 | 381321.50 | 3657.000000 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 67 | Westwood Park | 37.73415 | -122.45700 | 2015.00 | 687.087575 | 382295.00 | 3959.000000 |
| 47 | Pacific Heights | 37.79298 | -122.43579 | 2013.00 | 689.555817 | 378401.00 | 2817.285714 |
| 35 | Miraloma Park | 37.73415 | -122.45700 | 2011.75 | 779.810842 | 375967.25 | 2155.250000 |
| 33 | Merced Heights | 37.71993 | -122.46595 | 2014.00 | 788.844818 | 380348.00 | 3414.000000 |
| 61 | Union Square District | 37.79101 | -122.40210 | 2012.50 | 903.993258 | 377427.50 | 2555.166667 |
69 rows × 7 columns
Question: Which neighborhood has the highest gross rent, and which has the highest sale price per square foot?
Answer: # Westwood Park has the highest gross rent ($3,959), and Union Square District has the highest sale price per square foot ($903.99).
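idxmax gives the same answer without eyeballing the sorted table. A sketch using a few rows copied from all_neighborhoods_df above:

```python
import pandas as pd

# A few rows copied from all_neighborhoods_df above
df = pd.DataFrame(
    {
        "Neighborhood": ["Westwood Park", "Union Square District", "Bayview"],
        "sale_price_sqr_foot": [687.087575, 903.993258, 204.588623],
        "gross_rent": [3959.0, 2555.166667, 2318.4],
    }
)

# Row labels of the maxima, mapped back to neighborhood names
print(df.loc[df["gross_rent"].idxmax(), "Neighborhood"])           # Westwood Park
print(df.loc[df["sale_price_sqr_foot"].idxmax(), "Neighborhood"])  # Union Square District
```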
Based on the visualizations that you have created, compose a data story that synthesizes your analysis by answering the following questions:
Question: How does the trend in rental income growth compare to the trend in sales prices? Does this same trend hold true for all the neighborhoods across San Francisco?
Answer: # Rental income and sale prices both trend upward and are positively correlated overall, but the relationship does not hold uniformly across all neighborhoods.
Question: What insights can you share with your company about the potential one-click, buy-and-rent strategy that they're pursuing? Do neighborhoods exist that you would suggest for investment, and why?
Answer: # Based on the historical data, the strategy would work well in most neighborhoods, because gross rents have risen regardless of the sale-price trend. Exiting could be an issue in neighborhoods where sale prices have lagged, but as a buy-and-hold rental strategy it appears sound.